discovery for RNA-seq sequencing count data
With the significant cost reduction of sequencing technology,
analysing gene expression patterns or detecting differentially expressed
genes based on sequencing count data has become increasingly popular in
biological/medical research. The sequencing count data are normally
generated using a next-generation sequencer such as the so-called
machine [Behjati and Tarpey, 2013; Forde and O’Toole, 2013;
, et al., 2014]. A next-generation sequencer generates short
sequences, which are normally 100 base pairs long or shorter. Such a short
sequence is called a sequencing read. One of the major applications of
sequencing count data is the analysis of transcriptome data for discovering
gene differential expression patterns [Crowgey, et al., 2020; Goswami and
2020]. Therefore, the majority of the sequencing count data used
for discovering differentially expressed genes is called RNA-seq count
data. It must be noted that an individual sequencing read has no biological
meaning. The biological meaning of sequencing reads can be investigated
only when they have been aligned or mapped to a reference genome. There
are several packages for mapping or aligning collected sequencing reads to a
reference genome, such as BWA [Li and Durbin, 2009] and
Bowtie [Langmead, et al., 2009]. An active gene may attract a greater
number of sequencing reads while an inactive gene may attract few or
no sequencing reads. Only after sequencing reads have been mapped
to a reference genome is it possible to assess whether a
gene is more active or less active than other genes depending on the
number of sequencing reads that hit the gene. The direct outcome of
mapping sequencing reads to a reference genome is a sequencing
count matrix across genes and replicates. Only after such a matrix has been
generated, discovering DEGs based on sequencing count data can then